Rank | Count | Beginning |
---|---|---|
14 | 454 | Uthe |
47 | 152 | Lo |
232 | 137 | Lokhu |
74 | 127 | Okhulumela |
32 | 97 | Uma |
91 | 92 | Le |
1 | 90 | ” |
164 | 78 | INLSA |
56 | 71 | Kuthiwa |
188 | 68 | UMnuz |
48 | 67 | Ngesikhathi |
266 | 58 | Uthi |
5 | 56 | UNKSZ |
212 | 54 | Abantu |
16 | 51 | Lesi |
199 | 51 | Leli |
25 | 49 | Yize |
260 | 49 | Amaphoyisa |
89 | 48 | UNKK |
39 | 42 | " |
712 | 42 | Ngemuva |
3 | 38 | Laba |
21 | 36 | Njengoba |
142 | 36 | Okunye |
194 | 36 | Ngokusho |
248 | 36 | Abanye |
499 | 35 | Uqhube |
27 | 33 | Kodwa |
249 | 32 | Omunye |
558 | 32 | Ngaphandle |
The next four subsections show the most frequent sentence beginnings consisting of N words, for N = 1, 2, 3, 4. This subsection starts with N = 1.
The most frequent word-N-grams at the beginning of sentences give some insight into sentence composition.
For N=1 in particular, even a small corpus is enough to identify the most frequent sentence beginnings.
select substring_index(sentence, ' ', 1) as beg, count(*) as cnt from sentences group by substring_index(sentence, ' ', 1) order by cnt desc limit 50;
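The same count can be reproduced outside the database. The following is a minimal Python sketch, assuming the sentences are available as a list of strings. The function name `sentence_beginnings`, its parameters, and the sample data are hypothetical and not part of the corpus tooling. It mirrors `substring_index(sentence, ' ', n)` by splitting on single spaces and keeping the first n tokens, so the same function covers N = 1, 2, 3, 4.

```python
from collections import Counter


def sentence_beginnings(sentences, n=1, limit=50):
    """Count the first n space-separated words of each sentence.

    Mirrors substring_index(sentence, ' ', n): split on single spaces
    and keep everything before the n-th space (or the whole sentence
    if it has fewer than n spaces).
    """
    counts = Counter(" ".join(s.split(" ")[:n]) for s in sentences)
    # most_common sorts by descending count, like ORDER BY cnt DESC LIMIT 50.
    return counts.most_common(limit)


# Hypothetical sample: first words are taken from the table above,
# the remaining tokens (w2, w3, ...) are placeholders, not real text.
sample = [
    "Uthe w2 w3",
    "Uthe w2 w4",
    "Lo w5 w6",
]

print(sentence_beginnings(sample, n=1))
print(sentence_beginnings(sample, n=2))
```

Splitting on a literal space rather than on arbitrary whitespace is a deliberate choice here, so that the result matches what the SQL query would return for the same strings.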
4.3.1.2 Most Frequent Sentence Beginnings II
4.3.1.3 Most Frequent Sentence Beginnings III
4.3.1.4 Most Frequent Sentence Beginnings IV
4.3.1.1 Most Frequent Sentence Endings I
4.3.1.2 Most Frequent Sentence Endings II
4.3.1.3 Most Frequent Sentence Endings III
4.3.1.4 Most Frequent Sentence Endings IV